
273 research outputs found

Dense Voxel 3D Reconstruction Using a Monocular Event Camera

Author: Chen Haodong
Chen Xiaoming
Chung Vera
Tan Li
Publication date: 01/09/2023

Event cameras are biologically inspired sensors that capture changes in brightness. These emerging cameras offer many advantages over conventional frame-based cameras, including high dynamic range, high frame rates, and extremely low power consumption. Owing to these advantages, event cameras have increasingly been adopted in various fields, such as frame interpolation, semantic segmentation, odometry, and SLAM. However, their application to 3D reconstruction for VR is underexplored. Previous methods in this field mainly focused on 3D reconstruction through depth map estimation. Methods that produce dense 3D reconstructions generally require multiple cameras, while methods that use a single event camera can only produce semi-dense results. Other single-camera methods that do produce dense 3D reconstructions rely on pipelines that incorporate either the aforementioned methods or existing Structure from Motion (SfM) or Multi-view Stereo (MVS) methods. In this paper, we propose a novel approach to dense 3D reconstruction using only a single event camera. To the best of our knowledge, our work is the first attempt in this regard. Our preliminary results demonstrate that the proposed method can produce visually distinguishable dense 3D reconstructions directly, without the pipelines used by existing methods. Additionally, we have created a synthetic dataset of 39,739 object scans using an event camera simulator. This dataset will help accelerate other relevant research in this field.

arXiv.org e-Print Archive
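
The abstract rests on the fact that an event camera reports per-pixel brightness changes rather than full frames. A common idealized model, used by event camera simulators, fires an event whenever the log intensity at a pixel changes by at least a contrast threshold. The sketch below assumes that model, one threshold for both polarities, and grayscale intensities in [0, 1]; `events_between_frames` and its parameters are hypothetical and are not the simulator or reconstruction method used in the paper. Timestamps, sensor noise, and refractory periods are ignored.

```python
import math
from typing import List, Tuple

Event = Tuple[int, int, int]  # (x, y, polarity); polarity is +1 (brighter) or -1 (darker)


def events_between_frames(
    prev: List[List[float]],
    curr: List[List[float]],
    threshold: float = 0.2,
    eps: float = 1e-3,
) -> List[Event]:
    """Idealized event generation between two grayscale frames (hypothetical helper).

    Each pixel emits one event per full multiple of `threshold` by which its
    log intensity changed; `eps` avoids log(0) on black pixels.
    """
    events: List[Event] = []
    for y, (row_prev, row_curr) in enumerate(zip(prev, curr)):
        for x, (i_prev, i_curr) in enumerate(zip(row_prev, row_curr)):
            delta = math.log(i_curr + eps) - math.log(i_prev + eps)
            count = int(abs(delta) // threshold)
            polarity = 1 if delta > 0 else -1
            events.extend([(x, y, polarity)] * count)
    return events


if __name__ == "__main__":
    prev = [[0.5, 0.5], [0.5, 0.5]]
    curr = [[0.5, 1.0], [0.25, 0.5]]  # one pixel brightens, one darkens
    for event in events_between_frames(prev, curr):
        print(event)
```

Unchanged pixels emit nothing, so the output is sparse in space, unlike a frame, and carries only relative brightness changes.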

Fine-grained Activity Classification In Assembly Based On Multi-visual Modalities

Author: Chen Haodong
Leu Ming-Chuan
Yin Zhaozheng
Zendehdel Niloofar
Publication venue: Scholars' Mine
Publication date: 01/01/2023

Assembly activity recognition and prediction help improve productivity, quality control, and safety measures in smart factories. This study aims to sense, recognize, and predict a worker's continuous fine-grained assembly activities on a manufacturing platform. We propose a two-stage network for classifying workers' fine-grained activities by leveraging scene-level and temporal-level activity features. The first stage is a feature awareness block that extracts scene-level features from multiple visual modalities, including red, green, and blue (RGB) frames and hand skeleton frames. We use transfer learning in the first stage and compare three different pre-trained feature extraction models. The feature information from the first stage is then passed to the second stage to learn the temporal-level features of activities. The second stage consists of Recurrent Neural Network (RNN) layers and a final classifier. We compare the performance of two RNNs in the second stage: Long Short-Term Memory (LSTM) and the Gated Recurrent Unit (GRU). Partial video observation is used to predict fine-grained activities. In experiments using trimmed activity videos, our model achieves an accuracy of > 99% on our dataset and > 98% on the public UCF 101 dataset, outperforming state-of-the-art models. The prediction model achieves an accuracy of > 97% in predicting activity labels using the first 50% of each activity video. In experiments using an untrimmed video with continuous assembly activities, we combine our recognition and prediction models and achieve an accuracy of > 91% in real time, surpassing state-of-the-art models for recognizing continuous assembly activities.

Missouri University of Science and Technology (Missouri S&T): Scholars' Mine
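
The second stage described above runs recurrent layers over per-frame features from the first stage and compares LSTM and GRU cells. As a rough illustration of what one such layer computes, here is a minimal pure-Python GRU encoder using the standard gate equations (reset gate, update gate, candidate state), with biases omitted for brevity. The feature sizes, random weights, and the helper names `gru_step` and `encode_sequence` are assumptions for illustration; this is not the authors' trained model, and the final classifier is left out. The usage lines mimic partial observation by encoding only the first half of the frames.

```python
import math
import random
from typing import Dict, List

Vector = List[float]
Matrix = List[Vector]


def sigmoid(v: float) -> float:
    return 1.0 / (1.0 + math.exp(-v))


def matvec(w: Matrix, v: Vector) -> Vector:
    # Row-by-row dot products: (rows x cols) @ (cols,) -> (rows,)
    return [sum(a * b for a, b in zip(row, v)) for row in w]


def gru_step(x: Vector, h: Vector, p: Dict[str, Matrix]) -> Vector:
    """One GRU update; biases omitted to keep the sketch short."""
    r = [sigmoid(a + b) for a, b in zip(matvec(p["W_ir"], x), matvec(p["W_hr"], h))]  # reset gate
    z = [sigmoid(a + b) for a, b in zip(matvec(p["W_iz"], x), matvec(p["W_hz"], h))]  # update gate
    n = [  # candidate state: the reset gate scales the recurrent contribution
        math.tanh(a + ri * b)
        for a, ri, b in zip(matvec(p["W_in"], x), r, matvec(p["W_hn"], h))
    ]
    # Interpolate between the candidate and the previous hidden state.
    return [(1.0 - zi) * ni + zi * hi for zi, ni, hi in zip(z, n, h)]


def encode_sequence(features: List[Vector], hidden_size: int, seed: int = 0) -> Vector:
    """Run a randomly initialized GRU over per-frame features; return the last hidden state."""
    rng = random.Random(seed)
    in_size = len(features[0])

    def init(rows: int, cols: int) -> Matrix:
        return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

    p = {k: init(hidden_size, in_size) for k in ("W_ir", "W_iz", "W_in")}
    p.update({k: init(hidden_size, hidden_size) for k in ("W_hr", "W_hz", "W_hn")})
    h = [0.0] * hidden_size
    for x in features:
        h = gru_step(x, h, p)
    return h


if __name__ == "__main__":
    frames = [[0.1 * t, 0.2, -0.1 * t] for t in range(10)]  # toy 3-dim features per frame
    full = encode_sequence(frames, hidden_size=4)
    onset = encode_sequence(frames[: len(frames) // 2], hidden_size=4)  # first 50% only
    print(full)
    print(onset)
```

Because the same seed yields the same weights, `full` and `onset` differ only in how much of the video each has observed.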

Mitigating Representation Bias in Action Recognition: Algorithms and Benchmarks

Author: Chen Kai
Duan Haodong
Lin Dahua
Xiong Yuanjun
Zhao Yue
Publication date: 19/09/2022

Deep learning models have achieved excellent recognition results on large-scale video benchmarks. However, they perform poorly when applied to videos with rare scenes or objects, primarily because of the bias in existing video datasets. We tackle this problem from two angles: algorithm and dataset. From the algorithmic perspective, we propose Spatial-aware Multi-Aspect Debiasing (SMAD), which combines explicit debiasing through multi-aspect adversarial training with implicit debiasing through a spatial actionness reweighting module, to learn a more generic representation that is invariant to non-action aspects. To neutralize the intrinsic dataset bias, we propose OmniDebias, which selectively leverages web data for joint training and achieves higher performance with far less web data. To verify effectiveness, we establish evaluation protocols and perform extensive experiments on both re-distributed splits of existing datasets and a new evaluation dataset focusing on actions in rare scenes. We also show that the debiased representation generalizes better when transferred to other datasets and tasks. Comment: ECCVW 202

arXiv.org e-Print Archive

Experimental study on thermal runaway risk of 18650 lithium ion battery under side-heating condition

Author: Chen Haodong
Li Huang
Wang Qingsong
Wang Yu
Zhong Guobin
Publication venue: Elsevier BV
Publication date: 01/09/2019

Edinburgh Research Explorer

The Efficiency of Dodecafluoro-2-Methylpentan-3-One on Suppressing the Lithium Ion Battery Fire

Author: Chen Haodong
Duan Qiangling
Li Ke
Sun Jinhua
Wang Qingsong
Wang Yu
Publication venue: ASME International
Publication date: 11/04/2018

Crossref

Edinburgh Research Explorer

When Urban Region Profiling Meets Large Language Models

Author: Chen Haodong
Chen Wei
Liang Yuxuan
Wen Haomin
Wen Qingsong
Yan Yibo
Zhong Siru
Zimmermann Roger
Publication date: 21/10/2023

Urban region profiling from web-sourced data is of great importance for urban planning and sustainable development. We are witnessing a rising trend of LLMs in various fields, especially in multi-modal research such as vision-language learning, where the text modality serves as supplementary information for the image. Since the textual modality has never been introduced into the modality combinations used for urban region profiling, we aim to answer two fundamental questions in this paper: i) Can the textual modality enhance urban region profiling? ii) If so, in what ways and with regard to which aspects? To answer these questions, we leverage the power of Large Language Models (LLMs) and introduce the first-ever LLM-enhanced framework that integrates the knowledge of the textual modality into urban imagery profiling, named LLM-enhanced Urban Region Profiling with Contrastive Language-Image Pretraining (UrbanCLIP). Specifically, it first generates a detailed textual description for each satellite image using an open-source Image-to-Text LLM. The model is then trained on the image-text pairs, seamlessly unifying natural language supervision for urban visual representation learning, jointly with a contrastive loss and a language modeling loss. Results on predicting three urban indicators in four major Chinese metropolises demonstrate its superior performance, with an average improvement of 6.1% in R² over state-of-the-art methods. Our code and the image-language dataset will be released upon paper notification.

arXiv.org e-Print Archive
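
UrbanCLIP is trained with a contrastive loss on image-text pairs (alongside a language modeling loss) and evaluated with R². The sketch below shows the standard symmetric CLIP-style contrastive (InfoNCE) objective, on the assumption that UrbanCLIP's contrastive term follows this common form, together with the usual coefficient of determination. The temperature value, toy embeddings, and function names are illustrative assumptions, and the language modeling loss is omitted.

```python
import math
from typing import List

Vector = List[float]


def normalize(v: Vector) -> Vector:
    norm = math.sqrt(sum(a * a for a in v))
    return [a / norm for a in v]


def logsumexp(xs: List[float]) -> float:
    m = max(xs)  # subtract the max for numerical stability
    return m + math.log(sum(math.exp(x - m) for x in xs))


def clip_contrastive_loss(images: List[Vector], texts: List[Vector], temperature: float = 0.07) -> float:
    """Symmetric InfoNCE: pair i is the positive for row i (image) and column i (text)."""
    imgs = [normalize(v) for v in images]
    txts = [normalize(v) for v in texts]
    n = len(imgs)
    # Cosine similarities scaled by the temperature.
    logits = [[sum(a * b for a, b in zip(i, t)) / temperature for t in txts] for i in imgs]
    # Cross-entropy with target i: logsumexp over candidates minus the matching logit.
    image_to_text = sum(logsumexp(logits[i]) - logits[i][i] for i in range(n)) / n
    text_to_image = sum(
        logsumexp([logits[r][j] for r in range(n)]) - logits[j][j] for j in range(n)
    ) / n
    return 0.5 * (image_to_text + text_to_image)


def r_squared(y_true: List[float], y_pred: List[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    images = [[1.0, 0.0], [0.0, 1.0]]
    matched = [[0.9, 0.1], [0.1, 0.9]]
    print(clip_contrastive_loss(images, matched))  # small: each image is closest to its own caption
    print(clip_contrastive_loss(images, matched[::-1]))  # large: captions swapped
    print(r_squared([1.0, 2.0, 3.0], [2.0, 2.0, 2.0]))  # 0.0: predicting the mean scores zero
```

The loss pulls each satellite image toward its own generated description and away from the other descriptions in the batch, in both directions.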

Study of point-supported glass breakage behavior with varying point-covered areas under thermal loading

Author: Chen Haodong
Duan Qiangling
Jiang Lin
Lu Wei
Sun Jinhua
Wang Qingsong
Wang Yu
Publication venue: Elsevier BV
Publication date: 05/06/2018

Edinburgh Research Explorer

Advancements in Repetitive Action Counting: Joint-Based PoseRAC Model With Improved Performance

Author: Chang Zhuoqing
Chen Haodong
Hajmohammadi Solmaz
Leu Ming C.
Moniruzzaman Md
Yin Zhaozheng
Publication date: 15/08/2023

Repetitive action counting (RepCount) is critical in various applications, such as fitness tracking and rehabilitation. Previous methods have relied on red-green-blue (RGB) frames and estimated body pose landmarks to identify the number of action repetitions, but these methods suffer from several issues, including instability under changes in camera viewpoint, over-counting, under-counting, difficulty distinguishing between sub-actions, and inaccuracy in recognizing salient poses. In this paper, building on the work in [1], we integrate joint angles with body pose landmarks to address these challenges and achieve better results than state-of-the-art RepCount methods, with a Mean Absolute Error (MAE) of 0.211 and an Off-By-One (OBO) counting accuracy of 0.599 on the RepCount dataset [2]. Comprehensive experimental results demonstrate the effectiveness and robustness of our method. Comment: 6 pages, 9 figures

arXiv.org e-Print Archive
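
The method integrates joint angles with body pose landmarks and reports MAE and OBO. The sketch below shows both pieces: the angle at a joint computed from three landmarks (for example shoulder, elbow, and wrist), and the two counting metrics. The metric definitions assume the convention commonly used with the RepCount benchmark, where the absolute counting error is normalized by the ground-truth count and OBO counts predictions within one repetition of the truth; the function names and sample values are hypothetical, and the paper's actual joint selection is not reproduced.

```python
import math
from typing import List, Sequence


def joint_angle(a: Sequence[float], b: Sequence[float], c: Sequence[float]) -> float:
    """Angle in degrees at landmark b formed by segments b->a and b->c (2D or 3D)."""
    ba = [p - q for p, q in zip(a, b)]
    bc = [p - q for p, q in zip(c, b)]
    dot = sum(u * v for u, v in zip(ba, bc))
    norms = math.hypot(*ba) * math.hypot(*bc)
    cos_theta = max(-1.0, min(1.0, dot / norms))  # clamp rounding error before acos
    return math.degrees(math.acos(cos_theta))


def mae_normalized(gt: List[int], pred: List[int]) -> float:
    """Mean of |pred - gt| / gt over videos (assumes every gt count is > 0)."""
    return sum(abs(p - g) / g for g, p in zip(gt, pred)) / len(gt)


def obo_accuracy(gt: List[int], pred: List[int]) -> float:
    """Fraction of videos whose predicted count is within one repetition of the ground truth."""
    return sum(abs(p - g) <= 1 for g, p in zip(gt, pred)) / len(gt)


if __name__ == "__main__":
    shoulder, elbow, wrist = (0.0, 1.0), (0.0, 0.0), (1.0, 0.0)
    print(joint_angle(shoulder, elbow, wrist))  # right angle at the elbow
    print(mae_normalized([4, 5], [4, 7]), obo_accuracy([4, 5], [4, 7]))
```

Unlike raw landmark coordinates, a joint angle does not change when the whole body is translated or scaled in the image, which is one plausible reason angles help with viewpoint changes.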